home *** CD-ROM | disk | FTP | other *** search
- This is the second release of these programs. The first release (packaged
- by Udi Finkelstein) included only the m68k programs.
-
- This is my implementation of integer real fft computation. It was
- designed for high speed analysis of audio input in real time on a
- PC/AT. It has support for generic C and optimized assembly for:
- - intel 80x86
- - motorola 68k
- - natsemi 32k
- The assembler is usualy 4-8 times faster than the generic C so it is worth
- using.
-
- The data used was 8bit but the routine does 16bit signed arithmetic.
- The algorithm of choice was one which minimizes mults while keeping the
- number of needed registers to a minimum (the Intel chip does not have
- enough). Data if gradualy scaled in flight to avoid overflow.
-
- A standard (2,4) split-radix was used as described in [Sorensen et al.,
- IEEE ASSP-35 no 6., June 87, pp. 849-863]. Other algoriths (WFT etc.)
- where found to use too complex arithmetic for efficient implementation
- on a machine with a decent mult instruction because register spills slow
- it more than the reduce mults speed it.
-
- WHAT IT DOES NOT DO:
-
- - NO inverse fft.
- - NO complex input.
- - NO windowing.
- - output is power spectrum WITHOUT the square root taken.
- - the sample data array (x[]) is not descramled so you cannot use it directly.
- see how the power spectrum is derived if you want to generate phase or
- whatever. fft7() and fft8() function do this post-processing.
-
- The method of implementaion was to have a program (fftg.c) that
- generates the fft routine which is then compiled/assembled and used.
- The generated program is assembly, but for portability this release
- generates a C program too. I have routines for Intel assembly and ns32k
- assembly, other machines can have the basic routines re-written without
- much effort (but for best performance one would optimize these routines
- extensively).
-
- The archive contains a sample program (fft2.c) that uses the fft
- routine. It draws some basic picture on a screen using move() and
- draw() function which you will have to supply (it uses the microsoft c
- graphics library in hercules mode using msherc.com). To run the test
- program: 'fft2a' or 'fft2c' or 'fft2f'. you can try 'fft2a x' which
- will call fft 10000 times and display the time, then do 'fft2a x x' to
- measure the overhead of the loop, then a result of (e.g.) 29 from the
- first and 3 from the second means an average time of 2.6ms/fft. When
- the program asks for a data file respond 'd1' of 'd2' etc. or 'end' to
- stop.
-
- Originaly the program was a BASIC program from some British mag a few
- years ago which started my interest; it took a few minutes to do one
- fft. The current version does a 256 point fft on a 25MHz ns32532 in
- 2.6ms(asm)/4.2ms(c)/5.8ms(funcs), on a 10MHz 80286 it takes 7ms(asm, it
- has a rather fast int mult)/38.8ms(c)/44.4ms(funcs). As you see, gcc on
- the ns32k compares better to hadcrafted assembly than msc5.1.
-
- I now use a 20MHz 80286. If you sample at 44Khz, 256 points take 5.8ms
- to sample but only 2.9ms to do the fft (assuming the sampling is done
- via DMA, inerrupts would eat too much cpu time), so you have 2.9ms for
- other stuff (display? this is slow unless you really try. I wrote my
- low level display routines or the fft gain would be lost). On a 386/486
- you are laughing.
-
- The package contains one ready fft() in the file fft8c.f; it does a 256
- point fft in the slowest way but it should compile readily.
-
- A program to measure the time is also supplied as fft1.c. You will need to
- link the following:
- fft1.o fft8f.o fftsubs.o
- then run the result to see how fast it goes. Be patient, the test takes about
- one minute. On some machine (where 'mult' has variable time) this test will
- give only the typical speed.
-
- To get the generator programs 'make progs'. A generator program is built
- from two modules: fft.g (the core) and fftout?.c (the output driver). You
- should choose one of the drivers or write your own if there is no support
- for your machine (in which case I would like to hear from you). The driver
- fftoutf.c generates a C program that calls functions in fftsubs.c; this
- should work on every system.
-
- Once you have the generator, you need to run it and generate the fft program:
- fftgf fft8f 8 fft
- this will build a 256 (2^8) points fft() function into fft8f.c. Now compile
- and then link into your application.
-
- All of the above is done in the makefile, look at it.
-
- To get the fft programs 'make fft'. The 'c' version uses fftsubs.h. The
- 'f' version uses C function calls which is a bit slower but uses far
- less memory; it uses fftsubs.c. The '86' version uses fftsubs.mac, Note
- that the makefile generates a 256 point routine with entry point fft(),
- you may want to change it for your needs or just edit the output.
-
- fftouts.c generates ns32k assembly, fftouta.c generates Intel assembly.
- These should give a good idea of how to port the assembly level
- routines to another machine.
-
- To get the test programs 'make tests' (but first fix fft2.c for the
- graphics if needed).
-
- MSDOS NOTE: The c output has to be split into smaller files because a)
- the compiler wont optimize a large prog b) the compiler dies on any
- larger progs, and c) the code segment would be too large. The output
- routines in fftoutc.c will break the output every 1000 statements. This
- is fine for msc5.1, other compilers may cope with larger programs, so
- just up the limit in break_fft().
-
- HOW TO MODIFY FFT:
-
- The program to modify is fftg.c and the related fftout?.c that you use.
- You would probably modify the latter.
-
- In general, the calls to fft1...fft5 handle the fft proper while
- fft7+fft8 do the final power spectrum and reordering. The last two is
- where you extract the results you wanted from x[] into qf[]. x[] holds
- the real and complex parts in the order:
- xx[0-255]= R[0],...,R[128],I[127],...,I[1]
- but in this program xx[i] is in position x[mm[i]].
-
-
- HOW TO CALL FFT:
-
- In your program header include:
- short near x[256] = {0}; /* some compilers need to initialize */
- short near qf[128+1] = {0};
- extern far fft (); /* or whatever you called it */
- Then in the program:
- /* read the data points into x[0-255]*/
- fft ();
- /* the power spectrum is in qf[0-128], qf[0] is the DC componnent */
- Thats all.
-
- Please let me know of problems/difficulties you encounter with these
- programs as well as any suggestions/improvements.
-
- Regards
- Eyal Lebedinsky eyal@ise.canberra.edu.au
-